NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

LG-FAL: Locality-customized GSA Federated Active Learning

https://doi.org/10.1109/ICKG63256.2024.00056

Wang, Shuwen; Zhu, Xingquan (December 2024, IEEE)

Tackling Bias, Privacy, and Scarcity in Health Data Analytics

Wang, Shuwen (October 2023, Florida Atlantic University Dissertation)

Health data analytics has emerged as a critical domain with immense potential to revolutionize healthcare delivery, disease management, and medical research. However, it faces formidable challenges, including sample bias, data privacy concerns, and the cost and scarcity of labeled data. Together, these challenges impede the development of accurate and robust machine learning models for healthcare applications ranging from disease diagnosis to treatment recommendations. Sample bias and specificity refer to the inherent challenges of working with health datasets that may not be representative of the broader population or may exhibit disparities in their distributions. These biases can significantly reduce the generalizability and effectiveness of machine learning models in healthcare, potentially leading to suboptimal outcomes for certain patient groups. Data privacy and locality are paramount concerns in the era of digital health records and wearable devices. Protecting sensitive patient information while still extracting valuable insights from these data sources requires a delicate balance. Moreover, geographic and jurisdictional differences in data regulations further complicate the use of health data in a global context. Label cost and scarcity refer to the often labor-intensive and expensive process of obtaining ground-truth labels for supervised learning tasks in healthcare. The limited availability of labeled data can hinder the development and deployment of machine learning models, particularly in specialized medical domains. This dissertation focuses on health data analytics and explores approaches to tackle these challenges. Specifically, it studies three problems from different perspectives: (1) sample bias and specificity in health data, (2) data privacy and locality in health data, and (3) label cost and scarcity in health data.
FedDNA: Federated learning using dynamic node alignment

https://doi.org/10.1371/journal.pone.0288157

Wang, Shuwen; Zhu, Xingquan (July 2023, PLOS ONE)
Donta, Praveen Kumar (Ed.)
Federated Learning (FL), as a new computing framework, has received significant attention recently due to its advantages in preserving data privacy while training models with superb performance. During FL training, distributed sites first learn their respective parameters. A central site then consolidates the learned parameters, using averaging or other approaches, and disseminates the new weights to all sites to carry out the next round of learning. The distributed parameter learning and consolidation repeat iteratively until the algorithm converges or terminates. Many FL methods exist to aggregate weights from distributed sites, but most use a static node alignment approach, where nodes of the distributed networks are statically assigned, in advance, to their matching nodes, whose weights are then aggregated. In reality, the roles of individual nodes in neural networks, especially dense networks, are nontransparent. Combined with the random nature of the networks, static node matching often does not yield the best matching between nodes across sites. In this paper, we propose FedDNA, a dynamic node alignment federated learning algorithm. Our theme is to find the best matching nodes between different sites, and then aggregate the weights of matching nodes for federated learning. For each node in a neural network, we represent its weight values as a vector and use a distance function to find the most similar nodes, i.e., nodes with the smallest distance from other sites. Because finding the best matching across all sites is computationally expensive, we further design a minimum spanning tree based approach to ensure that a node from each site will have matched peers from other sites, such that the total pairwise distances across all sites are minimized. Experiments and comparisons demonstrate that FedDNA outperforms commonly used baselines, such as FedAvg, for federated learning.
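The aggregation idea in this abstract can be illustrated with a small sketch that contrasts static, position-wise (FedAvg-style) averaging with distance-based node alignment. It is a hedged illustration, not the FedDNA implementation. It assumes that each node is represented by its incoming weight vector, that the distance function is Euclidean, and that the best matching between two sites is found by exhaustive search (practical only for tiny layers). For the minimum spanning tree step it adopts one possible reading: an MST over sites, with pairwise matching costs as edge weights, used to propagate every site's alignment back to site 0. All function names (`best_matching`, `site_mst`, `aligned_average`, `fedavg_static`) and the toy weights are hypothetical.

```python
import itertools
import math


def node_distance(u, v):
    # Euclidean distance between two node weight vectors (assumed distance function).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def best_matching(layer_a, layer_b):
    # Exhaustive one-to-one matching with the smallest total distance.
    # perm[i] is the node of layer_b matched to node i of layer_a.
    # Only feasible for tiny layers; larger ones need e.g. the Hungarian algorithm.
    best_cost, best_perm = math.inf, None
    for perm in itertools.permutations(range(len(layer_b))):
        cost = sum(node_distance(layer_a[i], layer_b[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm


def site_mst(layers):
    # Prim's algorithm over sites; edge weight = best matching cost between two sites.
    # Returns (parent, child) edges in the order they join the tree rooted at site 0.
    n = len(layers)
    cost = {(i, j): best_matching(layers[i], layers[j])[0]
            for i in range(n) for j in range(i + 1, n)}
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        p, c = min(((p, c) for p in in_tree for c in range(n) if c not in in_tree),
                   key=lambda e: cost[tuple(sorted(e))])
        edges.append((p, c))
        in_tree.add(c)
    return edges


def aligned_average(layers):
    # Align every site's nodes to site 0 along the MST, then average aligned nodes.
    # align[s][k] is the node of site s matched to node k of site 0.
    num_nodes = len(layers[0])
    align = {0: list(range(num_nodes))}
    for p, c in site_mst(layers):
        _, perm = best_matching(layers[p], layers[c])
        align[c] = [perm[align[p][k]] for k in range(num_nodes)]
    dim = len(layers[0][0])
    return [[sum(layers[s][align[s][k]][d] for s in range(len(layers))) / len(layers)
             for d in range(dim)]
            for k in range(num_nodes)]


def fedavg_static(layers):
    # FedAvg-style static baseline with equal site weights: node k is averaged with node k.
    dim = len(layers[0][0])
    return [[sum(layer[k][d] for layer in layers) / len(layers) for d in range(dim)]
            for k in range(len(layers[0]))]


# Toy layer with two nodes (two incoming weights each) at three sites.
site_layers = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],  # same nodes as site 0, in swapped order
    [[0.9, 0.1], [0.1, 0.9]],
]
print(fedavg_static(site_layers))
print(aligned_average(site_layers))
```

Because site 1 holds the same nodes as site 0 in swapped order, static averaging blurs node 0 to roughly [0.63, 0.37], while the aligned version keeps it near [0.97, 0.03]. Two simplifications are worth noting: real FedAvg weights each site by its local sample count, and in a full network, permuting one layer's nodes also permutes the next layer's input weights, which this sketch ignores.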
Nationwide hospital admission data statistics and disease-specific 30-day readmission prediction

https://doi.org/10.1007/s13755-022-00195-7

Wang, Shuwen; Zhu, Xingquan (December 2022, Health Information Science and Systems)

3D nanostructure prediction of porous carbons via gas adsorption

https://doi.org/10.1016/j.carbon.2023.118431

Vallejos-Burgos, Fernando; de Tomas, Carla; Corrente, Nicholas J.; Urita, Koki; Wang, Shuwen; Urita, Chiharu; Moriguchi, Isamu; Suarez-Martinez, Irene; Marks, Nigel; Krohn, Matthew H.; et al. (November 2023, Carbon)

Cyberbullying and Cyberviolence Detection: A Triangular User-Activity-Content View

Wang, Shuwen; Zhu, Xingquan; Ding, Weiping; Yengejeh, Amir A. (May 2022, IEEE/CAA Journal of Automatica Sinica)

Deep learning data augmentation for Raman spectroscopy cancer tissue classification

https://doi.org/10.1038/s41598-021-02687-0

Wu, Man; Wang, Shuwen; Pan, Shirui; Terentis, Andrew C.; Strasswimmer, John; Zhu, Xingquan (December 2021, Scientific Reports)

Recently, Raman Spectroscopy (RS) was demonstrated to be a non-destructive method for cancer diagnosis, owing to the uniqueness of RS measurements in revealing molecular biochemical changes between cancerous and normal tissues and cells. To design computational approaches for cancer detection, both the quality and the quantity of tissue samples for RS are important for accurate prediction. In reality, however, obtaining skin cancer samples is difficult and expensive due to privacy and other constraints. With a small number of samples, training a classifier is difficult and often results in overfitting. Therefore, more samples are needed to better train classifiers for accurate cancer tissue classification. To overcome these limitations, this paper presents a novel generative adversarial network (GAN) based skin cancer tissue classification framework. Specifically, we design a data augmentation module that employs a GAN to generate synthetic RS data resembling the training data classes. The original tissue samples and the generated data are concatenated to train classification modules. Experiments on real-world RS data demonstrate that (1) data augmentation can help improve skin cancer tissue classification accuracy, and (2) a generative adversarial network can be used to generate reliable synthetic Raman spectroscopic data.
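The augmentation step described in this abstract, synthesizing class-conditional spectra and concatenating them with the real samples before training, can be sketched as follows. This sketch is deliberately not a GAN: as a hypothetical stand-in for the trained generator, each class is modeled by an independent Gaussian per spectral bin, fitted to that class's real spectra. The function names (`fit_class_sampler`, `augment`) and the toy spectra are illustrative assumptions; in the paper's framework, the sampler would be replaced by a GAN generator trained on the RS data.

```python
import random
import statistics


def fit_class_sampler(spectra):
    # Hypothetical stand-in for a trained GAN generator:
    # an independent Gaussian per spectral bin, fitted to one class's real spectra.
    columns = list(zip(*spectra))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return lambda rng: [rng.gauss(m, s) for m, s in zip(means, stdevs)]


def augment(spectra, labels, per_class, seed=0):
    # Generate `per_class` synthetic spectra for every class, then concatenate
    # them after the original samples so a classifier trains on both.
    rng = random.Random(seed)
    synth_x, synth_y = [], []
    for label in sorted(set(labels)):
        sampler = fit_class_sampler([x for x, y in zip(spectra, labels) if y == label])
        for _ in range(per_class):
            synth_x.append(sampler(rng))
            synth_y.append(label)
    return spectra + synth_x, labels + synth_y


# Toy data: four "spectra" with three intensity bins each (illustrative values).
real_x = [[0.1, 0.5, 0.9], [0.2, 0.4, 0.8], [0.9, 0.3, 0.1], [0.8, 0.2, 0.2]]
real_y = ["normal", "normal", "cancer", "cancer"]
aug_x, aug_y = augment(real_x, real_y, per_class=5)
print(len(aug_x), aug_y.count("cancer"))
```

The four real spectra come first, followed by five synthetic spectra per class, giving 14 training samples, 7 of them labeled "cancer". Swapping `fit_class_sampler` for a real generator would leave the concatenation logic unchanged.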
Predictive Modeling of Hospital Readmission: Challenges and Solutions

https://doi.org/10.1109/TCBB.2021.3089682

Wang, Shuwen; Zhu, Xingquan (January 2021, IEEE/ACM Transactions on Computational Biology and Bioinformatics)
Imbalanced Learning for Hospital Readmission Prediction using National Readmission Database

https://doi.org/10.1109/ICBK50248.2020.00026

Wang, Shuwen; Elkin, Magdalyn E.; Zhu, Xingquan (August 2020, Proceedings of the IEEE International Conference on Knowledge Graph (ICKG), August 9-11, 2020)